The Domain Dependence of Parsing

Author

Satoshi Sekine

Abstract

A major concern in corpus based approaches is that the applicability of the acquired knowledge may be limited by some feature of the corpus, in particular, the notion of text 'domain'. In order to examine the domain dependence of parsing, in this paper, we report 1) Comparison of structure distributions across domains; 2) Examples of domain specific structures; and 3) Parsing experiment using some domain dependent grammars. The observations using the Brown corpus demonstrate domain dependence and idiosyncrasy of syntactic structure. The parsing results show that the best accuracy is obtained using the grammar acquired from the same domain or the same class (fiction or nonfiction). We will also discuss the relationship between parsing accuracy and the size of training corpus.

1 Introduction

A major concern in corpus based approaches is that the applicability of the acquired knowledge may be limited by some feature of the corpus. In particular, the notion of text 'domain' has been seen as a major constraint on the applicability of the knowledge. This is a crucial issue for most application systems, since most systems operate within a specific domain and we are generally limited in the corpora available in that domain.

There has been considerable research in this area (Kittredge and Hirschman, 1983) (Grishman and Kittredge, 1986). For example, the domain dependence of lexical semantics is widely known. It is easy to observe that usage of the word 'bank' is different between the 'economic document' domain and the 'geographic' domain. Also, there are surveys of domain dependencies concerning syntax or syntax-related features (Slocum, 1986) (Biber, 1993) (Karlgren, 1994). It is intuitively conceivable that there are syntactic differences between 'telegraphic messages' and 'press report', or between 'weather forecast sentences' and 'romance and love story'. But, how about the difference between 'press report' and 'romance and love story'? Is there a general and simple method to compare domains? More importantly, shall we prepare different knowledge for these two domain sets?

In this paper, we describe two observations and an experiment which suggest an answer to the questions. Among the several types of linguistic knowledge, we are interested in parsing, the essential component of many NLP systems, and hence domain dependencies of syntactic knowledge. The observations and an experiment are the following:

• Comparison of structure distributions across domains
• Examples of domain specific structures
• Parsing experiment using some domain dependent grammars

2 Data and Tools

The definition of domain will dominate the performance of our experiments, so it is very important to choose a proper corpus. However, for practical reasons (availability and time constraint), we decided to use an existing multi-domain corpus which has naturally acceptable domain definition. In order to acquire grammar rules in our experiment, we need a syntactically tagged corpus consisting of different domains, and the tagging has to be uniform throughout the corpus. To meet these requirements, the Brown Corpus (Francis and Kucera, 1964) on the distribution of PennTreeBank version 1 (Marcus et al., 1995) is used in our experiments. The corpus consists of 15


Similar resources

A Comparative Study of the Effect of Part-of-Speech Tagging on Parsing in Automatic Processing of the Persian Language

In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...


An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...


Expert Discovery: A web mining approach

Expert discovery is the search for an answer to the question: “Who is the best expert of a specific subject in a particular domain within peculiar array of parameters?” Experts with domain knowledge in any field are crucial for consulting in industry, academia and the scientific community. The aim of this study is to address the issues of the expert-finding task in real-world communities. Collabor...


Any Domain Parsing - Automatic Domain Adaptation for Natural Language Parsing

Current efforts in syntactic parsing are largely data-driven. These methods require labeled examples of syntactic structures to learn statistical patterns governing these structures. Labeled data typically requires expert annotators which makes it both time consuming and costly to produce. Furthermore, once training data has been created for one textual domain, portability to similar domains is...
